An Efficient Text Summarizer using Lexical Chains

Authors

  • H. Gregory Silber
  • Kathleen F. McCoy
Abstract

We present a system that uses lexical chains as an intermediate representation for automatic text summarization. This system builds on previous research by implementing a lexical chain extraction algorithm in linear time. The system is reasonably domain independent and takes as input any text or HTML document. It outputs a short summary based on the most salient concepts from the original document. The length of the extracted summary can be controlled either automatically or manually, based on length or percentage of compression. While still under development, the system provides useful summaries that compare well in information content to human-generated summaries. Additionally, the system provides a robust test bed for future summary generation research.

1 Introduction

Automatic text summarization has long been viewed as a two-step process. First, an intermediate representation of the summary must be created. Second, a natural language representation of the summary must be generated using the intermediate representation (Sparck Jones, 1993). Much of the early research in automatic text summarization has involved generation of the intermediate representation. The natural language generation problem has only recently received substantial attention in the context of summarization.
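The pipeline described in the abstract (group related words into chains, rank the chains, then extract sentences up to a length or compression budget) can be illustrated with a minimal sketch. This is not the authors' algorithm: the `TOY_CONCEPTS` table stands in for a real lexical resource, and scoring a chain by its member count, picking the sentence where a chain first appears, and the default compression ratio of 0.3 are all assumptions made only for illustration. All function names are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical toy thesaurus mapping a word to a concept label.
# A real system would consult a lexical resource instead.
TOY_CONCEPTS = {
    "car": "vehicle", "truck": "vehicle", "engine": "vehicle",
    "rain": "weather", "storm": "weather",
}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def build_chains(sentences):
    """One pass over the words: concept label -> sentence indices of members."""
    chains = defaultdict(list)
    for index, sentence in enumerate(sentences):
        for word in re.findall(r"[a-z]+", sentence.lower()):
            concept = TOY_CONCEPTS.get(word)
            if concept is not None:
                chains[concept].append(index)
    return chains


def summarize(text, max_sentences=None, compression=None):
    """Extract sentences, limited by a sentence count or a compression ratio."""
    sentences = split_sentences(text)
    if not sentences:
        return []
    if max_sentences is None:
        # Assumed default ratio when no budget is given.
        ratio = 0.3 if compression is None else compression
        max_sentences = max(1, round(len(sentences) * ratio))
    chains = build_chains(sentences)
    # Assumption: a chain's strength is simply its number of members.
    ranked = sorted(chains.values(), key=len, reverse=True)
    picked = []
    for members in ranked:
        if len(picked) >= max_sentences:
            break
        # Assumption: represent a chain by the sentence of its first member.
        if members[0] not in picked:
            picked.append(members[0])
    # Return the chosen sentences in document order.
    return [sentences[i] for i in sorted(picked)]


if __name__ == "__main__":
    doc = ("The car stopped. Rain fell all day. "
           "The truck had a new engine. We ate lunch.")
    print(summarize(doc, max_sentences=1))
    print(summarize(doc, compression=0.5))
```

The two keyword arguments mirror the two manual controls the abstract mentions: an absolute length (`max_sentences`) and a percentage of compression (`compression`).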

Similar resources

Text Summarization Using Lexical Chains

Text summarization addresses both the problem of selecting the most important portions of text and the problem of generating coherent summaries. We present in this paper the summarizer of the University of Lethbridge at DUC 2001, which is based on an efficient use of lexical chains.

An Efficient Text Summarizer Using Lexical Chains

We present a system which uses lexical chains as an intermediate representation for automatic text summarization. This system builds on previous research by implementing a lexical chain extraction algorithm in linear time. The system is reasonably domain independent and takes as input any text or HTML document. The system outputs a short summary based on the most salient concepts from the origi...

IS_SUM: A Multi-Document Summarizer based on Document Index Graphic and Lexical Chains

IS_SUM is a summarizer developed at the Institute of Software (IS) of the Chinese Academy of Sciences for DUC 2005. We adopt a new way of clustering and summarizing documents by integrating Document Index Graphic (DIG) [7] with Lexical Chains [5]. Our results show the benefit of integrating DIG with Lexical Chains.

Cohesion and coherence for Automatic Summarization

This paper presents the integration of cohesive properties of text with coherence relations, to obtain an adequate representation of text for automatic summarization. A summarizer based on Lexical Chains is enhanced with rhetorical and argumentative structure obtained via Discourse Markers. When evaluated on a newspaper corpus, this integration yields only slight improvement in the resulting s...

Integrating cohesion and coherence for Automatic Summarization

This paper presents the integration of cohesive properties of text with coherence relations, to obtain an adequate representation of text for automatic summarization. A summarizer based on Lexical Chains is enhanced with rhetorical and argumentative structure obtained via Discourse Markers. When evaluated on a newspaper corpus, this integration yields only slight improvement in the resulting s...

Publication date: 2000